Act Report: insights and visualisations from the wrangled WeRateDogs data
The original image-prediction data contains three guesses per tweet image, each with a breed (or object) label, a dog/not-dog flag, and a confidence, in decreasing order of confidence (p1 > p2 > p3). From these three predictions we pick the most confident guess that is a dog type; if none of the three is a dog, we fall back to the first (most confident) prediction.
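This selection rule can be sketched as a small helper. The function name and argument layout (label plus dog flag for each of the three predictions) are illustrative, not the project's actual code:

```python
def pick_prediction(p1, p1_dog, p2, p2_dog, p3, p3_dog):
    """Return (label, is_dog) for the most confident prediction that is a
    dog (confidences satisfy p1 > p2 > p3); if none of the three is a dog,
    fall back to the first prediction."""
    for label, is_dog in ((p1, p1_dog), (p2, p2_dog), (p3, p3_dog)):
        if is_dog:
            return label, True
    return p1, False

# p1 was classified as an object, but p2 is a dog breed
print(pick_prediction('paper_towel', False, 'Labrador_retriever', True, 'bagel', False))
# → ('Labrador_retriever', True)
```

Applied row-wise to the prediction table, this yields one breed guess per image plus a flag for whether any prediction was a dog at all.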
Even with WeRateDogs' rating system, where the numerator can exceed the denominator, we can still normalise the ratings by computing score = rating numerator / rating denominator.
We see some outlier rating scores (420/10, 666/10, 182/10, 1776/10, 75/10), which are likely memes or references (420, 666, 1776). Checking some of these images, one is not even a dog.
99% of the calculated rating scores lie in the range [0.2, 1.4], so we remove the outliers (score > 3), which sit far outside this range.
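A minimal sketch of the score calculation and outlier filter, using toy numbers and assumed column names (`rating_numerator`, `rating_denominator`) rather than the real archive:

```python
import pandas as pd

# toy ratings; the real data has many more rows
df = pd.DataFrame({
    'rating_numerator':   [13, 10, 420, 12, 1776, 11],
    'rating_denominator': [10, 10, 10, 10, 10, 10],
})
df['score'] = df['rating_numerator'] / df['rating_denominator']

# inspect the central 99% of scores, then drop extreme outliers
print(df['score'].quantile([0.005, 0.995]))
cleaned = df[df['score'] <= 3]
print(sorted(cleaned['score']))
# → [1.0, 1.1, 1.2, 1.3]
```

The quantile check is what motivates the cutoff: anything above 3 is far beyond where 99% of the scores live.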
The outlier rating scores (7.5, 18.2, 42.0, 42.0, 66.6, 177.6) were removed from the left plot. The remaining scores cluster around 1.0 (10/10), though many scores exceed 1.0 because of WeRateDogs' rating system.
The right plot is a log-scale histogram showing that the removed outliers sit well outside the main cluster around $10^0$ (1.0). The outlier bars are barely visible, since each is a single count compared with the 50+ counts for the more common scores.
From the side-by-side histograms, we see a larger share of low scores (< 1.0) for tweets whose pictures were predicted to be non-dogs (right plot) than for tweets whose pictures were predicted to be dogs (left plot).
From the boxplot, we can see that when the image is predicted not to be a dog, the median sits at 1.0 and the interquartile range (25th–75th percentiles) is lower than when the image is predicted to be a dog, where the median is above 1.0.
From these side-by-side scatterplot comparisons, there appears to be a weak-to-moderate positive correlation between rating score and both favorite count and retweet count. There is a strong positive correlation between favorite count and retweet count.
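The correlations behind the scatterplots can be quantified with a Pearson correlation matrix. The numbers below are invented purely to illustrate the call, not taken from the real archive:

```python
import pandas as pd

# hypothetical engagement numbers illustrating the relationship
df = pd.DataFrame({
    'score':          [0.8, 1.0, 1.1, 1.2, 1.3, 1.4],
    'favorite_count': [900, 2000, 3500, 8000, 15000, 30000],
    'retweet_count':  [300, 600, 1100, 2500, 5000, 9000],
})
print(df.corr(method='pearson').round(2))
```

On data like this, favorite and retweet counts correlate very strongly with each other, while their correlation with the rating score is positive but weaker, matching what the scatterplots show.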
Plotting both retweet count and favorite count over time, we see that retweet counts increase very slightly, whereas favorite counts increase more noticeably over time.
Plotting the rating scores over time, we see that most of the lower rating scores occurred in earlier tweets (before 2016-10).
From the side-by-side box plots of the 4 dog stages (Doggo, Floofer, Pupper, Puppo), we see that Doggos and Floofers tend to have higher rating scores than Non-Doggos and Non-Floofers respectively. Puppers have about the same scores as Non-Puppers. Puppos have much higher scores than Non-Puppos.
Breeds with the best rating scores are: clumber, Bouvier_des_Flandres, Saluki, Pomeranian, briard. The worst-rated breeds are: Japanese_spaniel, soft-coated_wheaten_terrier, Scotch_terrier, Walker_hound, Tibetan_terrier.
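The per-breed ranking comes from averaging scores within each breed. A sketch with made-up rows and assumed column names (`breed` for the chosen prediction, `score` for the normalised rating):

```python
import pandas as pd

# toy data; real analysis runs over the full wrangled table
df = pd.DataFrame({
    'breed': ['clumber', 'Saluki', 'Saluki', 'Japanese_spaniel', 'clumber'],
    'score': [2.7, 1.25, 1.2, 0.5, 1.1],
})
mean_scores = df.groupby('breed')['score'].mean().sort_values(ascending=False)
print(mean_scores)
```

In practice one would also filter out breeds with only a handful of tweets, since a single unusual rating can dominate a small group's mean.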
Let's check some pictures with the highest ratings (> 1.3):
Now some pictures with the lowest ratings (< 0.2):
The 5 best-scoring pictures that were not predicted to be dogs:
Some non-dog categories such as 'pole' and 'dough' have surprisingly high rating scores. Checking these images, the actual dog is either small in the picture or the picture is of something that merely looks like a dog, which is why they were misclassified.
Pictures with the most retweet counts:
Pictures with the most favorite counts:
Given the overlap between the two lists, it makes sense that pictures with the highest retweet counts also tend to have the highest favorite counts.